combinatorial method
Comparing Efficiency of Expert Data Aggregation Methods
Kadenko, Sergii, Tsyganok, Vitaliy
Expert estimation of objects takes place when there are no benchmark values of object weights, but these weights still have to be defined. That is why it is problematic to define the efficiency of expert estimation methods. We propose to define efficiency of such methods based on stability of their results under perturbations of input data. We compare two modifications of combinatorial method of expert data aggregation (spanning tree enumeration). Using the example of these two methods, we illustrate two approaches to efficiency evaluation. The first approach is based on usage of real data, obtained through estimation of a set of model objects by a group of experts. The second approach is based on simulation of the whole expert examination cycle (including expert estimates). During evaluation of efficiency of the two listed modifications of combinatorial expert data aggregation method the simulation-based approach proved more robust and credible. Our experimental study confirms that if weights of spanning trees are taken into consideration, the results of combinatorial data aggregation method become more stable. So, weighted spanning tree enumeration method has an advantage over non-weighted method (and, consequently, over logarithmic least squares and row geometric mean methods).
Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs
Xu, Jiaming, Wu, Rui, Zhu, Kai, Hajek, Bruce, Srikant, R., Ying, Lei
In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we propose three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.